A Detailed Look at Scale and Translation Invariance in a Hierarchical Neural Model of Visual Object Recognition

Authors

  • Robert Schneider
  • Maximilian Riesenhuber
Abstract

The HMAX model has recently been proposed by Riesenhuber & Poggio [15] as a hierarchical model of position- and size-invariant object recognition in visual cortex. It has also successfully modeled a number of other properties of the ventral visual stream (the visual pathway thought to be crucial for object recognition in cortex), particularly those of (view-tuned) neurons in macaque inferotemporal cortex, the brain area at the top of the ventral stream. The original modeling study [15] used only “paperclip” stimuli, as in the corresponding physiology experiment [8], and did not systematically explore how model units’ invariance properties depended on model parameters. In this study, we aimed at a deeper understanding of the inner workings of HMAX and its performance for various parameter settings and “natural” stimulus classes. We systematically examined HMAX responses for different stimulus sizes and positions and found a dependence of model units’ responses on stimulus position, for which we offer a quantitative description. Scale invariance properties were found to depend on the particular stimulus class used. Moreover, a given view-tuned unit can exhibit substantially different invariance ranges when mapped with different probe stimuli. This has potentially interesting ramifications for experimental studies, in which the receptive field of a neuron and its scale invariance properties are usually mapped with probe objects of only a single type.

Copyright © Massachusetts Institute of Technology, 2002. This report describes research done within the Center for Biological & Computational Learning in the Department of Brain & Cognitive Sciences and in the Artificial Intelligence Laboratory at the Massachusetts Institute of Technology. This research was sponsored by grants from: Office of Naval Research (DARPA) under contract No. N00014-00-1-0907, National Science Foundation (ITR) under contract No. IIS-0085836, National Science Foundation (KDI) under contract No. DMS9872936, and National Science Foundation under contract No. IIS-9800032. Additional support was provided by: AT&T, Central Research Institute of Electric Power Industry, Center for e-Business (MIT), Eastman Kodak Company, DaimlerChrysler AG, Compaq, Honda R&D Co., Ltd., ITRI, Komatsu Ltd., Merrill-Lynch, Mitsubishi Corporation, NEC Fund, Nippon Telegraph & Telephone, Oxygen, Siemens Corporate Research, Inc., Sumitomo Metal Industries, Toyota Motor Corporation, WatchVision Co., Ltd., and The Whitaker Foundation. R.S. is supported by grants from the German National Scholarship Foundation, the German Academic Exchange Service and the State of Bavaria. M.R. is supported by a McDonnell-Pew Award in Cognitive Neuroscience.

Similar resources

Investigating Nonlinearities in the Standard Model of Object Recognition

The Problem: The standard model of object recognition in cortex is a computational model of the ventral pathway thought to mediate visual object recognition [5]. It is composed of hierarchical feedforward layers of neuron-like units, performing either (1) template matching with a Gaussian function to increase feature complexity or (2) nonlinear pooling with a maximum operation to increase translatio...

Attention can improve a simple model for object recognition

Object recognition is one of the most important tasks of the visual cortex. Even though it has been closely studied in the field of computer vision and neuroscience, the underlying processes in the visual cortex are not completely understood. A model that lately has gained attention is the HMAX model, which describes a feedforward hierarchical structure. This model shows a degree of scale and t...

Invariant Object Recognition Using Neural Network Ensemble on the CM

ABSTRACT This paper concerns machine recognition of objects from their images, where the recognition is invariant to scale, translation, and rotation. A neural network used for recognizing input objects is a four-layer backpropagation network, and a cluster of interconnected units spanning four layers of each network forms a functional block called a column. The 90° rotation invariance has been ...

A Model of V4 Shape Selectivity and Invariance

Object recognition in primates is mediated by the ventral visual pathway and is classically described as a feedforward hierarchy of increasingly sophisticated representations. Neurons in macaque monkey area V4, an intermediate stage along the ventral pathway, have been shown to exhibit selectivity to complex boundary conformation and invariance to spatial translation. How could such a represent...

Fast, invariant representation for human action in the visual system

The ability to recognize the actions of others from visual input is essential to humans’ daily lives. The neural computations underlying action recognition, however, are still poorly understood. We use magnetoencephalography (MEG) decoding and a computational model to study action recognition from a novel dataset of well-controlled, naturalistic videos of five actions (run, walk, jump, eat, dri...

Publication date: 2002